Resource Monitoring in a Distributed Storage System

ABSTRACT

A distributed electronic storage system (DESS) comprises a plurality of computing devices communicatively coupled via one or more network links and having a file system distributed among them. The DESS comprises management circuitry that resides on a first computing device of the plurality of computing devices. The management circuitry is operable to generate an indication of a load on a first resource that resides on the first computing device. The management circuitry is operable to receive, via the one or more network links, an indication of a load on a second resource that resides on a second computing device of the plurality of computing devices. The management circuitry is operable to determine a condition of the DESS based on the indication of the load on the first resource and the indication of the load on the second resource.

PRIORITY CLAIM

This application claims priority to the following application(s), each of which is hereby incorporated herein by reference:

U.S. provisional patent application 62/288,106 titled “Congestion Mitigation in a Distributed Storage System” filed on Jan. 28, 2016.

INCORPORATION BY REFERENCE

Each of the following documents is hereby incorporated herein by reference in its entirety:

U.S. patent application Ser. No. 14/789,422 titled “Virtual File System Supporting Multi-Tiered Storage” and filed on Jul. 1, 2015;

U.S. patent application Ser. No. 14/833,053 titled “Distributed Erasure Coded Virtual File System” and filed on Aug. 22, 2015; and

U.S. patent application Ser. No. ______ titled “Congestion Mitigation in a Distributed Storage System” (Attorney Docket 60304US02) and filed on the same date as this application.

BACKGROUND

Limitations and disadvantages of conventional approaches to data storage will become apparent to one of skill in the art, through comparison of such approaches with some aspects of the present method and system set forth in the remainder of this disclosure with reference to the drawings.

BRIEF SUMMARY

Methods and systems are provided for congestion mitigation in a distributed storage system substantially as illustrated by and/or described in connection with at least one of the figures, as set forth more completely in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates various example configurations of a distributed electronic storage system (DESS) in accordance with aspects of this disclosure.

FIG. 2 illustrates various example configurations of a compute node that uses a distributed electronic storage system in accordance with aspects of this disclosure.

FIG. 3 illustrates various example configurations of a distributed electronic storage system node in accordance with aspects of this disclosure.

FIG. 4 illustrates various example configurations of a dedicated storage node in accordance with aspects of this disclosure.

FIG. 5A illustrates an example implementation of a node configured for congestion mitigation in accordance with aspects of this disclosure.

FIG. 5B is a flowchart illustrating an example process for congestion mitigation performed by the node of FIG. 5A.

FIG. 6 illustrates another example implementation of a node configured for congestion mitigation in accordance with aspects of this disclosure.

FIG. 7 illustrates another example implementation of a node configured for congestion mitigation in accordance with aspects of this disclosure.

FIGS. 8A-8C illustrate an example initialization and update of resource load levels in a DESS, in accordance with aspects of this disclosure.

FIG. 9 illustrates aspects of a DESS configured to perform resource load monitoring.

FIG. 10 is a flowchart illustrating an example process of resource load logging and reporting in a DESS.

FIG. 11 is a flowchart illustrating an example process for automatic provisioning and de-provisioning of DESS resources.

FIG. 12 is a block diagram illustrating configuration of a DESS from a non-transitory machine-readable storage medium.

DETAILED DESCRIPTION

FIG. 1 illustrates various example configurations of a distributed electronic storage system in accordance with aspects of this disclosure. Shown in FIG. 1 is a local area network (LAN) 102 comprising one or more distributed electronic storage system (DESS) nodes 120 (indexed by integers from 1 to J, for J≥1), and optionally comprising (indicated by dashed lines): one or more dedicated storage nodes 106 (indexed by integers from 1 to M, for M≥1), one or more compute nodes 104 (indexed by integers from 1 to N, for N≥1), and/or an edge router 110 that connects the LAN 102 to a remote network 118. The remote network 118 optionally comprises one or more storage services 114 (indexed by integers from 1 to K, for K≥1), and/or one or more dedicated storage nodes 115 (indexed by integers from 1 to L, for L≥1). The nodes of the LAN 102 are communicatively coupled via interconnect 101 (e.g., copper cables, fiber cables, wireless links, switches, bridges, hubs, and/or the like).

Each compute node 104 _(n) (n an integer, where 1≤n≤N) is a networked computing device (e.g., a server, personal computer, or the like) that comprises circuitry for running a variety of client processes (either directly on an operating system of the node 104 _(n) and/or in one or more virtual machines/containers running on the device 104 _(n)) and for interfacing with one or more DESS nodes 120. As used in this disclosure, a “client process” is a process that reads data from storage and/or writes data to storage in the course of performing its primary function, but whose primary function is not storage-related (i.e., the process is only concerned that its data is reliably stored and retrievable when needed, and not concerned with where, when, or how the data is stored). Example applications which give rise to such processes include: an email server application, a web server application, office productivity applications, customer relationship management (CRM) applications, and enterprise resource planning (ERP) applications, just to name a few. Example configurations of a compute node 104 _(n) are described below with reference to FIG. 2.

Each DESS node 120 _(j) (j an integer, where 1≤j≤J) is a networked computing device (e.g., a server, personal computer, or the like) that comprises circuitry for running DESS processes and, optionally, client processes (either directly on an operating system of the device 120 _(j) and/or in one or more virtual machines running in the device 120 _(j)). As used in this disclosure, a “DESS process” is a process that implements aspects of one or more of: the DESS driver, the DESS front end, the DESS back end, the DESS memory controller, the DESS administrator, the DESS provisioner, and the DESS logger/monitor described below in this disclosure (any one or more of which may implement one or more choking processes, as described below). Thus, in an example implementation, resources (e.g., processing and memory resources) of the DESS node 120 _(j) may be shared among client processes and DESS processes. The processes of the DESS may be configured to demand relatively small amounts of the resources to minimize the impact on the performance of the client processes. From the perspective of the client process(es), the interface with the DESS may be independent of the particular physical machine(s) on which the DESS process(es) are running. Example configurations of a DESS node 120 _(j) are described below with reference to FIG. 3.

Each on-premises dedicated storage node 106 _(m) (m an integer, where 1≤m≤M) is a networked computing device and comprises one or more storage devices and associated circuitry for making the storage device(s) accessible via the LAN 102. An example configuration of a dedicated storage node 106 _(m) is described below with reference to FIG. 4.

Each storage service 114 _(k) (k an integer, where 1≤k≤K) may be a cloud-based service such as Amazon S3, Microsoft Azure, Google Cloud, Rackspace, Amazon Glacier, or Google Nearline.

Each remote dedicated storage node 115 _(l) (l an integer, where 1≤l≤L) may be similar to, or the same as, an on-premises dedicated storage node 106. In an example implementation, a remote dedicated storage node 115 _(l) may store data in a different format and/or be accessed using different protocols than an on-premises dedicated storage node 106 (e.g., HTTP as opposed to Ethernet-based or RDMA-based protocols).

FIG. 2 illustrates various example configurations of a compute node that uses a DESS in accordance with aspects of this disclosure. The example compute node 104 _(n) comprises hardware 202 that, in turn, comprises a processor chipset 204 and a network adaptor 208.

The processor chipset 204 may comprise, for example, an x86-based chipset comprising a single or multi-core processor system on chip, one or more RAM ICs, and a platform controller hub IC. The chipset 204 may comprise one or more bus adaptors of various types for connecting to other components of hardware 202 (e.g., PCIe, USB, SATA, and/or the like).

The network adaptor 208 may, for example, comprise circuitry for interfacing to an Ethernet-based and/or RDMA-based network. In an example implementation, the network adaptor 208 may comprise a processor (e.g., an ARM-based processor) and one or more of the illustrated software components may run on that processor. The network adaptor 208 interfaces with other members of the LAN 102 via (wired, wireless, or optical) link 226. In an example implementation, the network adaptor 208 may be integrated with the chipset 204.

Software running on the hardware 202 of compute node 104 _(n) includes at least: an operating system and/or hypervisor 212, one or more client processes 218 (indexed by integers from 1 to Q, for Q≥1), and one or both of: a DESS driver 221 and DESS front end 220. Additional software that may optionally run on the compute node 104 _(n) includes: one or more virtual machines (VMs) and/or containers 216 (indexed by integers from 1 to R, for R≥1).

Each client process 218 _(q) (q an integer, where 1≤q≤Q) may run directly on an operating system/hypervisor 212 or may run in a virtual machine and/or container 216 _(r) (r an integer, where 1≤r≤R) serviced by the OS and/or hypervisor 212.

The DESS driver 221 is operable to receive/intercept local file system commands (e.g., POSIX commands) and generate corresponding file system requests (e.g., read, write, create, make directory, remove, remove directory, link, etc.) to be transmitted onto the interconnect 101. In some instances, the file system requests transmitted on the interconnect 101 may be of a format customized for use with the DESS front end 220 and/or DESS back end 222 described herein. In some instances, the file system requests transmitted on the interconnect 101 may adhere to a standard such as Network File System (NFS), Server Message Block (SMB), Common Internet File System (CIFS), and/or the like.

Each DESS front end instance 220 _(s) (s an integer, where 1≤s≤S if at least one front end instance is present on compute node 104 _(n)) provides an interface for routing file system requests to an appropriate DESS back end instance (running on a DESS node), where the file system requests may originate from one or more of the client processes 218, one or more of the VMs and/or containers 216, and/or the OS and/or hypervisor 212. Each DESS front end instance 220 _(s) may run on the processor of chipset 204 or on the processor of the network adaptor 208. For a multi-core processor of chipset 204, different instances of the DESS front end 220 may run on different processing cores.

FIG. 3 shows various example configurations of a distributed electronic storage system node in accordance with aspects of this disclosure. The example DESS node 120 _(j) comprises hardware 302 that, in turn, comprises a processor chipset 304, a network adaptor 308, and, optionally, one or more storage devices 306 (indexed by integers from 1 to P, for P≥1).

Each storage device 306 _(p) (p an integer, where 1≤p≤P if at least one storage device is present) may comprise any suitable storage device for realizing a tier of storage that it is desired to realize within the DESS node 120 _(j).

The processor chipset 304 may be similar to the chipset 204 described above with reference to FIG. 2. The network adaptor 308 may be similar to the network adaptor 208 described above with reference to FIG. 2 and may interface with other nodes of LAN 102 via link 326.

Software running on the hardware 302 includes at least: an operating system and/or hypervisor 212, and at least one of: one or more instances of DESS front end 220 (indexed by integers from 1 to W, for W≥1), one or more instances of DESS back end 222 (indexed by integers from 1 to X, for X≥1), and one or more instances of DESS memory controller 224 (indexed by integers from 1 to Y, for Y≥1). Additional software that may optionally run on the hardware 302 includes: one or more virtual machines (VMs) and/or containers 216 (indexed by integers from 1 to R, for R≥1), and/or one or more client processes 218 (indexed by integers from 1 to Q, for Q≥1). As mentioned above, DESS processes and client processes may share resources on a DESS node.

The client processes 218 and VM(s) and/or container(s) 216 are as described above with reference to FIG. 2.

Each DESS front end instance 220 _(w) (w an integer, where 1≤w≤W, if at least one front end instance is present on DESS node 120 _(j)) provides an interface for routing file system requests to an appropriate DESS back end instance (running on the same or a different DESS node), where the file system requests may originate from one or more of the client processes 218, one or more of the VMs and/or containers 216, and/or the OS and/or hypervisor 212. Each DESS front end instance 220 _(w) may run on the processor of chipset 304 or on the processor of the network adaptor 308. For a multi-core processor of chipset 304, different instances of the DESS front end 220 may run on different processing cores.

Each DESS back end instance 222 _(x) (x an integer, where 1≤x≤X, if at least one back end instance is present on DESS node 120 _(j)) services the file system requests that it receives and carries out tasks to otherwise manage the DESS (e.g., load balancing, journaling, maintaining metadata, caching, moving of data between tiers, removing stale data, correcting corrupted data, etc.). Each DESS back end instance 222 _(x) may run on the processor of chipset 304 or on the processor of the network adaptor 308. For a multi-core processor of chipset 304, different instances of the DESS back end 222 may run on different processing cores.

Each DESS memory controller instance 224 _(y) (y an integer, where 1≤y≤Y, if at least one DESS memory controller instance is present on DESS node 120 _(j)) handles interactions with a respective storage device 306 (which may reside in the DESS node 120 _(j), in another DESS node 120, or in a storage node 106). This may include, for example, translating addresses and generating the commands that are issued to the storage device (e.g., on a SATA, PCIe, or other suitable bus). Thus, the DESS memory controller instance 224 _(y) operates as an intermediary between a storage device and the various DESS back end instances of the DESS.

FIG. 4 illustrates various example configurations of a dedicated storage node in accordance with aspects of this disclosure. The example dedicated storage node 106 _(m) comprises hardware 402 which, in turn, comprises a network adaptor 408 and at least one storage device 306 (indexed by integers from 1 to Z, for Z≥1). Each storage device 306 _(z) may be the same as storage device 306 _(p) described above with reference to FIG. 3. The network adaptor 408 may comprise circuitry (e.g., an ARM-based processor) and a bus (e.g., SATA, PCIe, or other) adaptor operable to access (read, write, etc.) storage device(s) 306 ₁-306 _(Z) in response to commands received over network link 426. The commands may adhere to a standard protocol. For example, the dedicated storage node 106 _(m) may support RDMA-based protocols (e.g., InfiniBand, RoCE, iWARP, etc.) and/or protocols which ride on RDMA (e.g., NVMe over Fabrics).

In an example implementation, Tier 1 memory is distributed across one or more storage devices 306 (e.g., FLASH devices) residing in one or more storage node(s) 106 and/or one or more DESS node(s) 120. Data written to the DESS is initially stored to Tier 1 memory, and then migrated to one or more other tier(s) as dictated by data migration policies, which may be user-defined and/or adaptive based on machine learning.
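The data migration policy is left open above (user-defined and/or adaptive). As a minimal sketch only, the following assumes a hypothetical user-defined idle-time policy: new writes land in Tier 1, and any extent not accessed for longer than `max_idle_s` is demoted to a lower tier. The `Extent` record and both function names are illustrative and are not part of the DESS described here.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    name: str
    tier: int            # 1 = fastest tier (e.g., FLASH)
    last_access: float   # hypothetical bookkeeping: time of last read/write, in seconds


def write_extent(extents: list, name: str, now: float) -> None:
    """New data is always placed in Tier 1 first."""
    extents.append(Extent(name=name, tier=1, last_access=now))


def apply_idle_policy(extents: list, now: float, max_idle_s: float,
                      target_tier: int = 2) -> list:
    """Demote Tier 1 extents idle for longer than max_idle_s; return their names."""
    moved = []
    for extent in extents:
        if extent.tier == 1 and now - extent.last_access > max_idle_s:
            extent.tier = target_tier
            moved.append(extent.name)
    return moved
```

An adaptive policy, as mentioned in the text, could tune `max_idle_s` from observed access patterns rather than fixing it in advance.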

FIG. 5A illustrates a first example implementation of a node configured for congestion mitigation in accordance with aspects of this disclosure. The example DESS node 120 ₁ in FIG. 5A is configured to implement a client process 218, a file system request buffer 504, an instance of DESS front end 220, an instance of DESS back end 222, a storage device 306 comprising a buffer 502, and one or more file system choking process(es) 506.

The client process 218 may be as described above with reference to FIGS. 1-4. The client process 218 submits file system requests to the DESS, and those file system requests are buffered in file system request buffer 504.

The file system request buffer 504 may, for example, reside in memory of the chipset 304 (FIG. 3). In the example implementation shown, the node 120 ₁ comprises only a single buffer 504. In an example implementation in which the DESS comprises a plurality of distributed file systems which are mounted on the node 120 ₁, the node 120 ₁ may comprise a plurality of buffers 504, one for each of the mounted file systems.

The buffer 502 may, for example, comprise RAM within the storage device 306. The buffer 502 is used for buffering data being read from and/or written to nonvolatile storage (e.g., FLASH) of the storage device 306.

The file system choking process(es) 506 control the rate at which the file system requests in the buffer 504 are fetched by the front end 220 so as to manage congestion in (and, thus, the quality of service provided by) the DESS.

In operation, the front end 220 fetches batches of file system requests from the buffer 504, determines which back end instance(s) 222 should service the request(s), generates the appropriate DESS message(s) for conveying the request(s) to the back end(s) 222, and transmits the DESS message(s) to the determined back end(s) 222 via the network 102. The back end(s) 222 receive the DESS message(s) and perform the operations necessary to carry out the file system request (typically involving reading and/or writing data and/or metadata from/to one or more storage device(s) 306). The rate at which the file system requests are fetched from the buffer 504 is controlled by the choking process(es) 506. In an example implementation (further described below with reference to FIGS. 9A-9D), this comprises the choking process(es) 506 determining a choking level and then adjusting one or more settings based on the determined choking level. The one or more settings may comprise, for example: a batch timing setting (i.e., the timing of when file system requests are fetched from the buffer 504) and a batch size setting (i.e., how many file system requests are fetched from the buffer 504 at a time). The batch timing setting may, for example, be an interval duration and/or an offset relative to some reference time.
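The two choking settings can be illustrated with a small sketch. It assumes, hypothetically, that fetches occur on a fixed grid `reference + offset + k * interval` and that each fetch removes at most `batch_size` requests from the head of buffer 504, modeled here as a `collections.deque`. Names such as `ChokingSettings` and `next_fetch_time` are illustrative only.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class ChokingSettings:
    interval_s: float   # batch timing: spacing between fetches
    offset_s: float     # batch timing: offset relative to a reference time
    batch_size: int     # batch size: maximum requests taken per fetch


def next_fetch_time(now: float, reference: float, s: ChokingSettings) -> float:
    """Next fetch instant strictly after `now` on the grid reference + offset + k*interval."""
    start = reference + s.offset_s
    if now < start:
        return start
    k = math.floor((now - start) / s.interval_s) + 1
    return start + k * s.interval_s


def fetch_batch(buffer: deque, s: ChokingSettings) -> list:
    """Remove up to s.batch_size requests from the head of the request buffer."""
    batch = []
    while buffer and len(batch) < s.batch_size:
        batch.append(buffer.popleft())
    return batch
```

Under this sketch, a higher choking level would translate into a longer `interval_s`, a smaller `batch_size`, or both.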

The control of the rate at which file system requests are fetched may be based on information about the state of the DESS. The state information may be based on the load on (i.e., level of usage of) resources of the DESS. The load may be a most-recently measured/recorded load or may be a predicted load based on historical measurements/recordings (for the same DESS and/or other DESSs) being input to a prediction algorithm. Such resources may include resources of the node 120 ₁ (DESS resources “local” to node 120 ₁). Such resources may also include similar resources of other nodes 104, 120 _(j), and/or 106 of the DESS (DESS resources that are “remote” from the perspective of node 120 ₁). Information about the loads on remote resources may be determined from DESS messages received from other nodes of the DESS. Similarly, the node 120 ₁ may transmit DESS messages which indicate the loads on its resources. Such DESS messages may contain a direct representation of the load on one or more resources and/or may contain values calculated based on the load on one or more resources. This bidirectional exchange of choking information gives choking processes 506 throughout the DESS a more holistic view of the state of the DESS, which enables them to control the rate at which they submit file system requests to the DESS more effectively than if they had to base that control only on their respective local resource loads.
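The paragraph above leaves both the prediction algorithm and the rule for combining local and remote loads unspecified. The sketch below substitutes two simple, clearly hypothetical choices: a least-squares linear trend extrapolated one step ahead as the predictor, and a worst-case rule that reports the DESS as congested when any resource on any node reaches a threshold (0.8 here, chosen arbitrarily). Loads are assumed to be fractions in [0, 1].

```python
def predict_next_load(history: list) -> float:
    """Fit a least-squares line to (i, history[i]) and evaluate it at i = len(history)."""
    n = len(history)
    if n == 0:
        raise ValueError("need at least one sample")
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


def dess_condition(local_loads: dict, remote_loads: dict, threshold: float = 0.8) -> str:
    """Combine local loads with loads reported by other nodes (keyed by node id)."""
    all_views = [local_loads, *remote_loads.values()]
    worst = max(load for view in all_views for load in view.values())
    return "congested" if worst >= threshold else "normal"
```

With local loads of `{"cpu": 0.3, "storage": 0.4}`, the local view alone reports "normal"; adding a remote report of `{"cpu": 0.9}` from another node changes the result to "congested", which is the kind of information the bidirectional exchange makes available.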

Resources for which resource load may be monitored include one or more of the following: storage device, CPU, network, and memory. A load on a storage device may, for example, be represented by a single value determined from depth of buffer 502, or represented by two values where the first is determined from depth of read buffer 710 and the second is determined from depth of write buffer 712. A load on a CPU may, for example, be represented by a value corresponding to a running average of percentage of available cycles per second being used. A load on a network adaptor or link may, for example, be represented by a single value determined from depth of transmit and/or receive buffers, or represented by two values where the first is determined from depth of a transmit buffer and the second is determined from depth of a receive buffer. A load on a memory may, for example, be represented by a single value determined from the amount of used (or free) memory.
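The per-resource load representations above might be computed as follows. This is a sketch under stated assumptions: loads are normalized to fractions of capacity, and the CPU "running average" is taken to be an exponential moving average with a hypothetical smoothing factor `alpha`; the text does not specify which averaging method is used.

```python
class RunningAverage:
    """Exponential moving average: one plausible 'running average' of CPU use."""

    def __init__(self, alpha: float = 0.2):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, percent_busy: float) -> float:
        """Fold in a new sample of the percentage of cycles used."""
        if self.value is None:
            self.value = percent_busy
        else:
            self.value = self.alpha * percent_busy + (1.0 - self.alpha) * self.value
        return self.value


def buffer_load(depth: int, capacity: int) -> float:
    """Fraction of a buffer (storage, transmit, or receive) that is occupied."""
    return depth / capacity


def storage_load(read_depth: int, read_capacity: int,
                 write_depth: int, write_capacity: int) -> tuple:
    """Two-value storage load: (read-buffer load, write-buffer load)."""
    return (buffer_load(read_depth, read_capacity),
            buffer_load(write_depth, write_capacity))


def memory_load(used_bytes: int, total_bytes: int) -> float:
    """Single-value memory load from the amount of used memory."""
    return used_bytes / total_bytes
```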

Details of example operation of the implementation of FIG. 5A will now be described with reference to the flowchart of FIG. 5B.

The process of FIG. 5B begins with block 552, in which the DESS begins its startup/initialization process (e.g., after power up or reset of the node(s) across which it is distributed).

In block 554, various resources (e.g., CPU(s), memory, network adaptor(s), and storage device(s)) of the DESS are characterized. For example, a choking process 506 on each node of the DESS may determine (e.g., through one or more commands supported by the node's operating system) the identity (e.g., manufacturer, model number, serial number, and/or the like) of local resources, and use those identities to retrieve corresponding characteristics from a resource characteristics database (e.g., stored locally in the network 102 and/or accessible via the Internet). For a resource such as a CPU, such characteristics may include, for example, clock speed, cache size, cache speed, number of cores, and/or the like. For a resource such as memory, such characteristics may include, for example, size of memory, speed of memory, and/or the like. For a network adaptor, such characteristics may include, for example, latency, maximum throughput, buffer size, and/or the like. For a resource such as a storage device, such characteristics may include, for example, size of its buffer 502, write speed (e.g., in input/output operations per second (IOPS)) as a function of the depth (i.e., fill level) of its buffer 502, read speed as a function of the depth of its buffer 502, and/or the like. In instances that a record is not found in the database for an identified resource, a choking process 506 may perform a characterization of the resource before proceeding to block 555. As an example, test reads and/or writes may be issued to a storage device 306 and the resulting read and/or write speed as a function of the depth of its buffer 502 may be monitored and then used to generate a characterization which is then stored to the database.
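The lookup-then-measure flow of block 554 for a storage device can be sketched as below. The database is modeled as an in-memory dict keyed by resource identity, and `measure` stands in for a hypothetical routine that issues test I/O at a given buffer depth and returns the observed speed; neither reflects a specific DESS interface.

```python
from typing import Callable, Dict, Hashable, Iterable

Profile = Dict[int, float]  # buffer depth -> observed speed (e.g., IOPS)


def characterize(identity: Hashable, database: Dict[Hashable, Profile],
                 measure: Callable[[int], float], depths: Iterable[int]) -> Profile:
    """Return the stored profile for `identity`, measuring and caching it if absent."""
    profile = database.get(identity)
    if profile is None:
        # No record found: issue test I/O at each buffer depth and record the speed.
        profile = {depth: measure(depth) for depth in depths}
        database[identity] = profile
    return profile
```

A second call for the same identity returns the stored profile without issuing any further test I/O.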

In block 555, one or more settings used by the choking process(es) 506 are configured based on the resource characteristics determined in block 554. As an example, one or more functions may be used for mapping resource load values to congestion contribution values, mapping congestion contribution values to a choking level, and mapping a choking level to values for a batch timing setting and a batch size setting. Such function(s) may have one or more parameters which may be set based on the characteristics determined in block 554.
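The three mappings of block 555 could take many forms. As one hypothetical example, the sketch below uses a piecewise-linear load-to-contribution curve whose `knee` parameter would be set from the block 554 characteristics (for instance, the fraction of buffer 502 depth beyond which a storage device's measured IOPS stops improving), a worst-of rule to pick the choking level, and linear interpolation from choking level to batch settings. All parameter names and default values are assumptions.

```python
def congestion_contribution(load: float, knee: float) -> float:
    """Map a load in [0, 1] to a contribution in [0, 1].

    Below `knee` the resource is treated as uncongested; above it the
    contribution rises linearly, reaching 1.0 at full load.
    """
    if not 0.0 <= knee < 1.0:
        raise ValueError("knee must be in [0, 1)")
    if load <= knee:
        return 0.0
    return min(1.0, (load - knee) / (1.0 - knee))


def choking_level(contributions, levels: int = 8) -> int:
    """Take the worst contribution and quantize it to 0 .. levels-1."""
    worst = max(contributions)
    return min(levels - 1, int(worst * levels))


def batch_settings(level: int, levels: int = 8, min_interval_ms: float = 1.0,
                   max_interval_ms: float = 16.0, max_batch: int = 64) -> tuple:
    """Higher choking level -> longer interval between fetches and smaller batches."""
    frac = level / (levels - 1)
    interval_ms = min_interval_ms + frac * (max_interval_ms - min_interval_ms)
    size = max(1, round(max_batch * (1.0 - frac)))
    return interval_ms, size
```

Choosing the worst contribution means a single saturated resource is enough to throttle fetching; a weighted sum would be a softer alternative.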

In block 556, each node of the DESS determines its initial choking settings (e.g., initial batch timing and batch size settings). The initial choking settings may, for example, be set empirically by a DESS administrator and/or may be set automatically by the choking process 506 based on historical settings used in this DESS and/or other DESSs (e.g., as adapted by a learning algorithm).

In block 557, the DESS is ready to begin servicing file system requests.

In block 558, a front end 220 of a DESS node 120 _(j) (note: the node 120 _(j) may be a different node on different iterations through the loop comprising blocks 558-566) fetches file system request(s) from its buffer 504 based on its choking settings (e.g., values of batch timing and batch size), and generates one or more corresponding DESS message(s) (e.g., message(s) to convey the file system requests to the appropriate back end(s) 222).

In block 560, a choking process 506 of the node 120 _(j) inserts choking information into the DESS message(s).

In block 562, the node 120 _(j) transmits the DESS message(s) into the network 102.

In block 564, other node(s) of the DESS receive the DESS message(s) and extract the choking information.

In block 566, the other node(s) update their choking settings based on the choking information from node 120 _(j) and based on their most-recent load information for other resources.
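Blocks 560 and 564 require some encoding of choking information inside a DESS message. The sketch below assumes, purely for illustration, a 4-byte field prefixed to the message payload: a 16-bit node identifier followed by four 3-bit load values (0 to 7, matching the range used in the example of FIGS. 8A-8C), packed with Python's standard `struct` module in network byte order. The field layout and function names are hypothetical.

```python
import struct

RESOURCES = ("cpu", "memory", "storage", "network")
HEADER = struct.Struct("!HH")  # node id, packed loads (network byte order)


def insert_choking_info(payload: bytes, node_id: int, loads: dict) -> bytes:
    """Block 560 (sketch): prefix the payload with a 4-byte choking-info field."""
    bits = 0
    for i, name in enumerate(RESOURCES):
        level = loads[name]
        if not 0 <= level <= 7:
            raise ValueError(f"{name} load {level} outside 0..7")
        bits |= level << (3 * i)  # three bits per resource, 12 bits in total
    return HEADER.pack(node_id, bits) + payload


def extract_choking_info(message: bytes) -> tuple:
    """Block 564 (sketch): split a received message into (node id, loads, payload)."""
    node_id, bits = HEADER.unpack_from(message)
    loads = {name: (bits >> (3 * i)) & 0b111 for i, name in enumerate(RESOURCES)}
    return node_id, loads, message[HEADER.size:]
```

A receiving node would then write the extracted loads into its record for the sending node and recompute its own choking settings (block 566).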

FIG. 6 illustrates another example implementation of a node configured for congestion mitigation in accordance with aspects of this disclosure. FIG. 6 is largely the same as FIG. 5A except the node 120 _(j) in FIG. 6 manages different types of file system requests separately. Specifically, file system requests which require reading and/or writing data to/from the distributed file system are managed separately from file system requests which require reading and/or writing metadata to/from the distributed file system. The separate management may be realized, for example, using two separate FIFO buffers 602 and 604 as shown, but may also be realized in other ways such as using a single random access buffer.

In the example implementation shown, the node 120 _(j) comprises only a single buffer 602 and a single buffer 604. In an example implementation in which the DESS comprises a plurality of distributed file systems which are mounted on the node 120 _(j), the node 120 _(j) may comprise a plurality of buffers 602 (one for each file system of the DESS mounted on node 120 _(j)) and a plurality of buffers 604 (one for each file system of the DESS mounted on node 120 _(j)).

Operation of the example node 120 _(j) of FIG. 6 is similar to that described with reference to FIG. 5A, with the rate at which requests are fetched from buffer 602 being controlled separately from the rate at which requests are fetched from buffer 604. For example, choking process(es) 506 of node 120 _(j) may control the rate at which file system data requests are fetched from buffer 602 by controlling a data batch timing setting (T_(D)) and a data batch size setting (S_(D)), and may control the rate at which file system metadata requests are fetched from buffer 604 by controlling a metadata batch timing setting (T_(M)) and a metadata batch size setting (S_(M)). The ability to separately control the rate of file system data requests and file system metadata requests is advantageous at least because, in many cases, file system metadata requests are more important than file system data requests: file system metadata requests enable, for example, querying the status of the DESS and making changes to optimize in-process file system operations. Further, metadata requests are often issued by interactive, “human generated” sessions, so getting them to execute more quickly results in a higher level of user satisfaction. Accordingly, in some instances when the DESS is getting congested, the choking process(es) 506 may reduce the rate at which requests are fetched from buffer 602 sooner and/or more aggressively than the rate at which requests are fetched from buffer 604. In some instances this may lead to a scenario in which file system metadata requests, but not file system data requests, are fetched during a determined time interval.
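Prioritizing metadata over data under congestion can be sketched by deriving the two batch sizes, S_(D) and S_(M), from a shared choking level. In this hypothetical scheme, the metadata batch shrinks linearly but never below one request per interval, while the data batch shrinks faster and drops to zero at `data_cutoff`, producing the metadata-only intervals described above. The cutoff and maximum sizes are arbitrary assumptions, and batch timing (T_(D), T_(M)) is omitted for brevity.

```python
from collections import deque


def split_batch_sizes(level: int, max_level: int = 7, max_data: int = 32,
                      max_meta: int = 32, data_cutoff: int = 5) -> tuple:
    """Return (data_batch_size, metadata_batch_size) for a choking level.

    Metadata shrinks linearly but never below one request per interval; data
    shrinks faster and is suspended entirely at or above `data_cutoff`.
    """
    meta = max(1, round(max_meta * (1 - level / max_level)))
    if level >= data_cutoff:
        data = 0
    else:
        data = round(max_data * (1 - level / data_cutoff))
    return data, meta


def fetch_interval(data_q: deque, meta_q: deque, level: int) -> tuple:
    """Fetch one interval's worth of requests, metadata first."""
    data_n, meta_n = split_batch_sizes(level)
    meta = [meta_q.popleft() for _ in range(min(meta_n, len(meta_q)))]
    data = [data_q.popleft() for _ in range(min(data_n, len(data_q)))]
    return meta, data
```

At choking level 5 this sketch yields batch sizes of 0 for data and 9 for metadata, so queued data requests stay in buffer 602 while metadata requests continue to flow.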

FIG. 7 illustrates another example implementation of a node configured for congestion mitigation in accordance with aspects of this disclosure. FIG. 7 is largely the same as FIG. 6 except, in FIG. 7, the separate management is of file system read requests and file system write requests, rather than of file system data requests and file system metadata requests. The separate management may be realized, for example, using two separate FIFO buffers 702 and 704 as shown, but may also be realized in other ways such as using a single random access buffer.

In the example implementation shown, the node 120 _(j) comprises only a single buffer 702 and a single buffer 704. In an example implementation in which the DESS comprises a plurality of distributed file systems which are mounted on the node 120 _(j), the node 120 _(j) may comprise a plurality of buffers 702 (one for each file system of the DESS mounted on node 120 _(j)) and a plurality of buffers 704 (one for each file system of the DESS mounted on node 120 _(j)).

Operation of the example node 120 _(j) of FIG. 7 is similar to that described with reference to FIG. 6, with the rate at which requests are fetched from buffer 702 being controlled separately from the rate at which requests are fetched from buffer 704. For example, choking process(es) 506 of node 120 _(j) may control the rate at which file system write requests are fetched from buffer 702 by controlling a write batch timing setting (T_(W)) and a write batch size setting (S_(W)), and may separately control the rate at which file system read requests are fetched from buffer 704 by controlling a read batch timing setting (T_(R)) and a read batch size setting (S_(R)). Where metadata requests are also managed separately, a metadata batch timing setting (T_(M)) and a metadata batch size setting (S_(M)) may likewise be controlled. The ability to separately control the rate of file system read requests and file system write requests is advantageous at least because, for example, write operations and read operations may use different resources which may become congested at different rates. For example, it may occur at some particular time that there are many read operations pending and thus buffer 710 of storage device 306 cannot accept any more read requests, but buffer 712 has capacity to accept write requests (and resources of the storage device 306 are available to begin working on such write requests). Without separate management of file system read requests and file system write requests, write requests in the buffer 504 (FIG. 5A) may be blocked by read requests waiting for resources in storage device 306 to free up. Similarly, it may occur at some particular time that there are many write operations pending and thus buffer 712 of storage device 306 cannot accept any more write requests, but buffer 710 has capacity to accept read requests (and resources of the storage device 306 are available to begin working on such read requests). Without separate management of file system read requests and file system write requests, read requests in the buffer 504 (FIG. 5A) may be blocked by write requests waiting for resources in storage device 306 to free up. The implementation of FIG. 7 avoids this problem and permits the DESS to begin working on pending requests of whichever type the storage device 306 can currently accept.
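The head-of-line blocking described above can be reproduced with a toy model. The sketch assumes that storage device 306 exposes a count of free slots in its read buffer 710 and write buffer 712, and that a dispatcher stops at the first request whose device buffer is full. With one shared FIFO (as with buffer 504), a read at the head blocks the queued writes; with separate queues (as with buffers 702 and 704), the writes proceed. The function names are illustrative.

```python
from collections import deque


def dispatch_single_fifo(queue: deque, free_slots: dict) -> list:
    """Dispatch in strict FIFO order; stop at the first request that cannot be accepted."""
    dispatched = []
    while queue and free_slots[queue[0]] > 0:
        kind = queue.popleft()
        free_slots[kind] -= 1
        dispatched.append(kind)
    return dispatched


def dispatch_separate(queues: dict, free_slots: dict) -> list:
    """Dispatch each request type from its own queue, independently of the other."""
    dispatched = []
    for kind, queue in queues.items():
        while queue and free_slots[kind] > 0:
            queue.popleft()
            free_slots[kind] -= 1
            dispatched.append(kind)
    return dispatched
```

With no free read slots and two free write slots, the shared FIFO holding `["read", "read", "write", "write"]` dispatches nothing, while the separate queues dispatch both writes.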

Now referring to FIGS. 8A-8C, there are shown three example nodes 802 ₁-802 ₃ of a DESS. Each node 802 _(X) (X between 1 and 3 in the example of FIGS. 8A-8C) may be a compute node 104, a DESS node 120, or a storage node 106. Each node 802 _(X) comprises a data structure 850 _(X) in which are stored resource load values for resources, both local and remote, of the DESS. For purposes of illustration (and not limitation), the example of FIGS. 8A-8C assumes four resources are monitored: CPU resources, memory resources, storage resources (a solid state drive (SSD) in the example, but any type of non-volatile storage could be used), and network resources. For purposes of illustration (and not limitation), each resource load value can range between 0 and 7. The load values may, for example, correspond to percentage of maximum load supported by the resource. The values may, for example, scale linearly (e.g., 12.5%, 25%, 37.5%, 50%, 62.5%, 75%, 87.5%, and 100%), nonlinearly (e.g., 20%, 40%, 60%, 75%, 85%, 90%, 95%, and 100%), or in any other desired manner. The relationship between the values representing the load and the actual load (e.g., as a percentage of maximum capacity) may be configurable by a system administrator and/or may adapt during operation of the DESS (e.g., based on log files generated by the DESS logger/monitor, as described with reference to FIGS. 9-10).
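Converting a measured load percentage into the 0 to 7 scale can be done with a table of upper bounds, one per level. The sketch below assumes that level i covers loads up to and including the i-th bound (an interpretation, since the text lists the percentages without defining their boundaries) and uses the standard-library `bisect` module to find the level. Both example scales from the text are shown.

```python
import bisect

# Upper bound (percent of maximum load) for levels 0..7.
LINEAR = [12.5, 25.0, 37.5, 50.0, 62.5, 75.0, 87.5, 100.0]
NONLINEAR = [20.0, 40.0, 60.0, 75.0, 85.0, 90.0, 95.0, 100.0]


def quantize_load(percent: float, upper_bounds: list) -> int:
    """Return the lowest level whose upper bound is >= percent."""
    if not 0.0 <= percent <= upper_bounds[-1]:
        raise ValueError(f"load {percent}% out of range")
    return bisect.bisect_left(upper_bounds, percent)
```

Keeping the bound table as data is one way to realize the administrator-configurable or adaptive relationship mentioned above: changing the scale means replacing the list, not the code.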

For purposes of illustration (and not limitation), each data structure 850_X is shown as a table where each row corresponds to a respective one of nodes 802₁-802₃, and each column corresponds to one of the monitored resources. Thus, for example, a ‘4’ in row 1, column 1 indicates a load of ‘4’ for CPU resources of node 802₁, and a ‘2’ in row 2, column 3 corresponds to a load of ‘2’ for a storage device of node 802₂.

FIG. 8A shows the nodes 802₁-802₃ upon initialization of the DESS. In this example, in order for the DESS to begin servicing file system requests without having to wait for the loads on the various resources to be determined (which may be difficult to do without processing actual file system requests, or at least “dummy” file system requests), each node 802_X initializes its resource load table 850_X to default values. For simplicity of illustration, it is assumed that ‘4’ is the default value for all resources. In practice, different resources may default to different values and/or different nodes may use different default values. The default values may be configured by a network administrator and/or may adapt based on previous operation of the DESS (e.g., based on log files generated by the DESS monitor/logger, as described with reference to FIGS. 9-10).
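The table of FIGS. 8A-8C, initialized to defaults as in FIG. 8A, can be modeled as a small two-dimensional array indexed by node (row) and resource (column). This is a sketch only; the class and method names are hypothetical, rows are 1-based to match the figures, and the non-storage values reported for node 802₂ are illustrative.

```python
RESOURCES = ("cpu", "memory", "ssd", "network")  # columns of table 850_X
DEFAULT_LOAD = 4  # default used for every resource in the FIG. 8A example


class LoadTable:
    def __init__(self, num_nodes: int, default: int = DEFAULT_LOAD):
        # Build each row separately so rows do not alias one another.
        self.rows = [[default] * len(RESOURCES) for _ in range(num_nodes)]

    def get(self, node: int, resource: str) -> int:
        return self.rows[node - 1][RESOURCES.index(resource)]

    def set_row(self, node: int, values) -> None:
        values = list(values)
        if len(values) != len(RESOURCES) or not all(0 <= v <= 7 for v in values):
            raise ValueError("expected one value in 0..7 per resource")
        self.rows[node - 1] = values


table = LoadTable(num_nodes=3)  # FIG. 8A: every entry starts at the default
table.set_row(2, [3, 5, 2, 1])  # node 802_2 reports a storage load of 2
print(table.get(2, "ssd"), table.get(1, "cpu"))  # 2 4
```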

Now referring to FIG. 8B, again shown are the nodes of FIG. 8A, but at a later time than in FIG. 8A. In FIG. 8B, each node 802_X has determined the load on its local resources. Thus, each node 802_X has updated row X of its data structure 850_X. In an example implementation, determination of the load on local resources may comprise file system choking process(es) of the nodes 802_X querying their respective OSs/hypervisors.

Now referring to FIG. 8C, again shown are the nodes of FIG. 8B, but at a later time than in FIG. 8B. In FIG. 8C, node 802₁ has a file system message 852 to transmit onto the network 102, and node 802₁ inserts resource load information into the message. The node 802₁ may, for example, insert row 1 of table 850₁ into the message 852, or may insert its whole table 850₁ into the message 852 (thus propagating other updates it may have received). The file system message 852 may, for example, be a message delivering a file system request, received by a DESS front end instance 220 (see, e.g., FIGS. 5A and 5B) on node 802₁, to the back end instance 222 (see, e.g., FIGS. 5A and 5B) that will service the request. In the example shown, the message 852 is sent as a broadcast message, such that each other node 802_X receives the message and updates row 1 of its table 850_X. In another example implementation, the message 852 may be a unicast message, but nodes to which the message is not destined may nevertheless “sniff” the message to extract a field of the message used for carrying resource load information. In another example implementation, the message 852 may be a unicast message and the resource load information carried therein may only be received by the device to which the message is destined. In such an implementation, only the one of nodes 802₂ and 802₃ to which the message 852 is destined will update row 1 of its table 850 based on this message. The other one of the nodes 802₂ and 802₃ will have to wait for a file system message directed to it (from either 802₁ or the other node, which has received the update) in order to receive the updated load values for node 802₁. In practice, with many file system messages being exchanged frequently, updates may propagate through the DESS quickly, such that any particular table 850_X will typically not contain many stale values and/or not retain a stale value for very long.
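The piggybacking and merging described above can be sketched as follows, here for a unicast message to node 802₂ whose row 1 is later relayed to node 802₃ inside node 802₂'s whole table. The text does not say how a receiver decides whether a propagated row is newer than its own copy; this sketch assumes each row carries the time at which its owner measured it. All class, field, and message names are hypothetical.

```python
from dataclasses import dataclass, field

DEFAULT_ROW = (4, 4, 4, 4)  # cpu, memory, ssd, network defaults (FIG. 8A)


@dataclass(frozen=True)
class Row:
    values: tuple
    measured_at: float  # assumed freshness stamp set by the row's owner


@dataclass
class Node:
    node_id: int
    num_nodes: int
    table: dict = field(default_factory=dict)

    def __post_init__(self):
        for n in range(1, self.num_nodes + 1):
            self.table[n] = Row(DEFAULT_ROW, 0.0)

    def update_local(self, values, now: float) -> None:
        """FIG. 8B: refresh this node's own row from local measurements."""
        self.table[self.node_id] = Row(tuple(values), now)

    def send(self, payload, whole_table: bool = False) -> dict:
        """FIG. 8C: piggyback one row, or the whole table, on a file system message."""
        rows = dict(self.table) if whole_table else {self.node_id: self.table[self.node_id]}
        return {"payload": payload, "load_info": rows}

    def receive(self, message: dict) -> None:
        """Keep a carried row only if it is fresher than the local copy."""
        for n, row in message["load_info"].items():
            if n != self.node_id and row.measured_at > self.table[n].measured_at:
                self.table[n] = row


n1, n2, n3 = (Node(i, 3) for i in (1, 2, 3))
n1.update_local((6, 3, 2, 5), now=1.0)           # node 802_1 measures its load
msg = n1.send("file system request")             # row 1 rides on message 852
n2.receive(msg)                                  # unicast: only node 802_2 sees it
n3.receive(n2.send("reply", whole_table=True))   # node 802_2 later relays its table
print(n3.table[1].values)  # (6, 3, 2, 5)
```

The freshness check matters once whole tables are relayed: without it, an old copy of row 1 arriving late could overwrite a newer one.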

In the example just described, resource load values are “piggybacked” on file system messages that would be sent anyway (even if there were no resource load information to convey). This reduces the resources required for communicating the resource load information. In some instances, however, file system messages may be generated and sent for the sole purpose of updating resource load values. For example, dedicated resource load value messages may be generated upon initialization of the DESS so as to reduce the reliance on default values. As another example, a dedicated resource load value message may be generated and sent in response to detecting that one or more load values have not been updated in more than a threshold amount of time (e.g., row 3 of table 850₁ may be determined stale if it has not been updated for more than the threshold amount of time, and a dedicated message carrying a more-recently updated row 3 from either table 850₂ or 850₃ may be communicated to the node 802₁).
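Detecting stale rows, as in the row 3 example above, amounts to comparing each row's last update time against a threshold and then asking a peer whose copy is newer. The function names, the timestamp bookkeeping, and the choice of the freshest peer are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def stale_rows(updated_at: Dict[int, float], now: float,
               threshold: float, self_id: int) -> List[int]:
    """Rows (other than our own) not updated for more than `threshold` seconds."""
    return sorted(n for n, t in updated_at.items()
                  if n != self_id and now - t > threshold)


def freshest_peer(row: int, peer_views: Dict[int, Dict[int, float]]) -> Optional[int]:
    """Among peers, pick the one whose copy of `row` was updated most recently."""
    candidates = {peer: view[row] for peer, view in peer_views.items() if row in view}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Node 802_1's view: row 3 was last updated at t=2.0.
print(stale_rows({1: 9.0, 2: 8.0, 3: 2.0}, now=10.0, threshold=5.0, self_id=1))  # [3]
print(freshest_peer(3, {2: {3: 9.5}, 3: {3: 9.8}}))  # 3
```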

FIG. 9 illustrates aspects of a DESS configured to perform resource load monitoring. The node 120_j shown in FIG. 9 is similar to the node 120_j shown in FIGS. 5A, 6, and 7, with the FS request buffer(s) 902 providing any desired buffering scheme, such as any of the buffering schemes described above with reference to FIGS. 5A, 6, and 7. In FIG. 9, one or more DESS nodes are configured to implement a database 904, a DESS provisioner 906, a DESS administrator 908, and a DESS monitor/logger 910.

The DESS monitor/logger 910 is operable to monitor performance of the DESS. This may comprise, for example, processing file system messages being communicated among the nodes of the DESS to extract information indicative of the load on DESS resources and/or performance of the DESS. For example, the monitor/logger 910 may extract resource load values, such as those described above with reference to FIGS. 8A-8C, from the file system messages and log these values (and/or metadata associated with the values) to the database 904. As another example, the monitor/logger 910 may track how long file system requests (e.g., particular requests and/or on average) are taking to complete and log this information to the database 904. In an example implementation, the DESS comprises a single DESS monitor/logger 910 which monitors all resources of the DESS (e.g., by “sniffing” all file system messages communicated among the nodes of the DESS). Such an implementation is shown in FIG. 9. In FIG. 9, the DESS monitor/logger 910 resides in a node other than 120_j for clarity of illustration, but it could have been shown as residing in the node 120_j. In another example implementation, each node of the DESS may comprise its own DESS monitor/logger 910 responsible for monitoring local resources of that node and logging the information to the database 904.
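The monitor/logger's two example duties, logging resource load values carried in sniffed messages and tracking how long requests take to complete, might look like the following. Python's built-in `sqlite3` module stands in for database 904, and the table schema and message fields (`kind`, `request_id`, `load_info`) are hypothetical.

```python
import sqlite3

RESOURCES = ("cpu", "memory", "ssd", "network")


class MonitorLogger:
    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE load_log (ts REAL, node INTEGER, resource TEXT, value INTEGER)")
        self.db.execute(
            "CREATE TABLE latency_log (request_id TEXT, started_at REAL, finished_at REAL)")
        self._in_flight = {}  # request_id -> time the request was first seen

    def on_message(self, ts: float, msg: dict) -> None:
        """Process one sniffed file system message."""
        for node, values in msg.get("load_info", {}).items():
            self.db.executemany(
                "INSERT INTO load_log VALUES (?, ?, ?, ?)",
                [(ts, node, res, v) for res, v in zip(RESOURCES, values)])
        if msg.get("kind") == "request":
            self._in_flight[msg["request_id"]] = ts
        elif msg.get("kind") == "completion" and msg["request_id"] in self._in_flight:
            started = self._in_flight.pop(msg["request_id"])
            self.db.execute("INSERT INTO latency_log VALUES (?, ?, ?)",
                            (msg["request_id"], started, ts))

    def average_latency(self):
        (avg,) = self.db.execute(
            "SELECT AVG(finished_at - started_at) FROM latency_log").fetchone()
        return avg  # None if no request has completed yet


mon = MonitorLogger()
mon.on_message(0.000, {"kind": "request", "request_id": "a", "load_info": {1: (5, 3, 2, 4)}})
mon.on_message(0.004, {"kind": "completion", "request_id": "a"})
mon.on_message(1.000, {"kind": "request", "request_id": "b"})
mon.on_message(1.010, {"kind": "completion", "request_id": "b"})
print(round(mon.average_latency(), 6))  # 0.007
```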

The DESS provisioner 906 is operable to provision (and deprovision) resources of one or more computing devices for use by the DESS. Provisioning and deprovisioning of resources may be performed based on the current and/or predicted load on resources provisioned for use by the DESS. The current load on provisioned resources may be determined from, for example, output (e.g., log files) of the DESS monitor/logger 910. The predicted load may, for example, be generated using a learning algorithm and historical resource load data (i.e., log files from monitor/logger 910). In an example implementation, the DESS comprises a single/centralized DESS provisioner 906 operable to perform provisioning and deprovisioning of resources across the entire DESS. Such an implementation is shown in FIG. 9. In FIG. 9, the DESS provisioner 906 resides in a node other than 120_j for clarity of illustration, but it could have been shown as residing in the node 120_j. Alternatively (or additionally), each node of the DESS may comprise its own DESS provisioner 906 operable to perform provisioning and deprovisioning of local resources.
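The text leaves the learning algorithm open. As a deliberately simple stand-in (not the method of this disclosure), the sketch below fits a least-squares line to recent load values taken from the logs, extrapolates it a few intervals ahead, and compares the prediction with hypothetical provisioning thresholds on the 0-7 scale.

```python
from typing import Sequence

MAX_LOAD = 7


def predict_load(history: Sequence[float], steps_ahead: int) -> float:
    """Least-squares linear trend over `history`, extrapolated `steps_ahead` samples."""
    n = len(history)
    if n == 0:
        return 0.0
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    return mean_y + slope * (n - 1 + steps_ahead - mean_x)


def provisioning_decision(history: Sequence[float], steps_ahead: int = 2,
                          high: float = 6, low: float = 2) -> str:
    # Clamp the extrapolated value to the valid load-value range before comparing.
    predicted = min(max(predict_load(history, steps_ahead), 0), MAX_LOAD)
    if predicted > high:
        return "provision"
    if predicted < low:
        return "deprovision"
    return "hold"


print(provisioning_decision([3, 4, 5, 6]),   # rising load
      provisioning_decision([4, 4, 4, 4]),   # steady load
      provisioning_decision([3, 2, 1, 1]))   # falling load
```

A real deployment would likely want something more robust to noise and seasonality; the point here is only that the decision is driven by a predicted, not just current, load.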

The DESS administrator 908 is operable to provide a user interface via which an administrator can monitor the DESS. For example, the DESS administrator 908 may provide an application programming interface via which data about the state of the DESS (e.g., log files and/or real-time output generated by the DESS monitor/logger 910 and/or the DESS provisioner 906) can be retrieved, visualized, etc. In FIG. 9, the DESS administrator 908 resides in a node other than 120_j for clarity of illustration, but it could have been shown as residing in the node 120_j. Alternatively (or additionally), each node of the DESS may comprise its own DESS administrator 908 via which an administrator can interact with the DESS.

The database 904 may reside on storage devices 306 of the DESS and may store data generated by the DESS monitor/logger 910 (e.g., log files), data generated by the DESS provisioner 906 (e.g., reports of provisioning and deprovisioning actions undertaken), and data generated by the DESS administrator 908 (e.g., log files).

FIG. 10 is a flowchart illustrating an example process of resource load logging and reporting in a DESS. The process of FIG. 10 may be performed continually, periodically, and/or occasionally (e.g., triggered by the occurrence of monitored-for events).

In block 1002, one or more choking processes 506 generate data indicating a load on, and/or performance of, resources provisioned for use by the DESS. In block 1004, the data generated in block 1002 is obtained by the DESS monitor/logger 910 (e.g., reported to the DESS monitor/logger 910 and/or “sniffed” as it is communicated across the network 102). In block 1006, the DESS monitor/logger 910 logs the data. In some instances, the DESS monitor/logger 910 also compares the data against determined criteria to decide whether an alert should be generated. The alert may be, for example, an email or SMS message. Some examples of such criteria comprise: load on a particular resource being above a threshold, load on a particular resource being below a threshold, a DESS performance metric (e.g., file system IOPS) being below a threshold, a DESS performance metric (e.g., file system IOPS) being above a threshold, and/or the like.
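The alert check of block 1006 reduces to evaluating each logged sample against a list of above/below criteria. This sketch shows only the comparison; the metric names and thresholds are hypothetical, and delivering the alert by email or SMS is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    metric: str      # e.g., a resource load value or a performance metric
    direction: str   # "above" or "below"
    threshold: float


def check_alerts(sample: Dict[str, float], criteria: List[Criterion]) -> List[str]:
    alerts = []
    for c in criteria:
        value = sample.get(c.metric)
        if value is None:
            continue  # metric not present in this sample
        if (c.direction == "above" and value > c.threshold) or \
           (c.direction == "below" and value < c.threshold):
            alerts.append(f"{c.metric} {c.direction} {c.threshold}: {value}")
    return alerts


criteria = [
    Criterion("node1.cpu_load", "above", 6),
    Criterion("node1.cpu_load", "below", 1),
    Criterion("fs_iops", "below", 50_000),
    Criterion("fs_iops", "above", 400_000),
]
print(check_alerts({"node1.cpu_load": 7, "fs_iops": 80_000}, criteria))
# ['node1.cpu_load above 6: 7']
```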

FIG. 11 is a flowchart illustrating an example process for automatic provisioning and de-provisioning of DESS resources. The process begins with startup of the DESS in block 1102.

In block 1104, nodes of the DESS determine and exchange various load and/or performance data (e.g., resource load values, measured performance metrics, and/or the like). After block 1104, the process advances to block 1106.

In block 1106, the load and/or performance data is analyzed by the DESS monitor/logger 910. After block 1106, the process advances to block 1108.

In block 1108, the DESS monitor/logger 910 determines whether an overloaded condition and/or an underperformance condition is present. The conditions tested for may adapt over time (e.g., based on log files generated by the DESS monitor/logger 910). An example overloaded condition is a load on one or more resources provisioned for use by the DESS being greater than a determined threshold. An example underperformance condition is file system IOPS being below a determined threshold. If neither condition is present, the process advances to block 1110.

In block 1110, the DESS monitor/logger 910 determines whether an underloaded condition and/or an overperformance condition is present. The conditions tested for may adapt over time (e.g., based on log files generated by the DESS monitor/logger 910). An example underloaded condition is a load on one or more resources provisioned for use by the DESS being less than a determined threshold. An example overperformance condition is file system IOPS being above a determined threshold. If neither condition is present, the process returns to block 1104.
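One pass through blocks 1108 and 1110 can be sketched as a classifier over the exchanged load values and a measured file system IOPS figure. The thresholds here are hypothetical fixed values, whereas the text allows them to adapt over time; as in FIG. 11, the overloaded/underperformance test is applied first.

```python
from enum import Enum
from typing import Dict


class Condition(Enum):
    OVERLOADED = "overloaded and/or underperformance"   # block 1108 -> 1116
    UNDERLOADED = "underloaded and/or overperformance"  # block 1110 -> 1112
    NORMAL = "normal"                                   # back to block 1104


def classify(loads: Dict[str, int], iops: float, *, load_high: int = 6, load_low: int = 2,
             iops_low: float = 20_000, iops_high: float = 200_000) -> Condition:
    # Block 1108: any resource above its threshold, or IOPS too low.
    if any(v > load_high for v in loads.values()) or iops < iops_low:
        return Condition.OVERLOADED
    # Block 1110: any resource below its threshold, or IOPS above what is needed.
    if any(v < load_low for v in loads.values()) or iops > iops_high:
        return Condition.UNDERLOADED
    return Condition.NORMAL


print(classify({"cpu": 7, "ssd": 3}, iops=80_000).name)  # OVERLOADED
```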

Returning to block 1110, if an underloaded and/or overperformance condition is present, the process advances to block 1112.

In block 1112, the DESS monitor/logger 910 alerts the DESS provisioner 906 to the presence of the underloaded and/or overperformance condition.

In block 1114, the DESS provisioner 906 determines which resources are no longer needed (e.g., by inspecting a log file generated by the DESS monitor/logger 910), and de-provisions the unnecessary resources. As an example, if fewer CPU resources can be tolerated, the provisioner 906 may deallocate a CPU core such that the CPU core can be reallocated to performing other, non-DESS functions. As another example, if less network bandwidth can be tolerated, the provisioner 906 may reduce the priority of DESS traffic in a network adaptor. The type(s) and/or amount of resources deprovisioned in block 1114 may, for example, be determined by the conditions present in block 1110 (e.g., based on the type of resource being underloaded, the amount by which the load on the resource is below a threshold, and/or the like). The type(s) and/or amount of resources deprovisioned in block 1114 may, for example, be determined using machine learning algorithms based on the logs generated by the DESS monitor/logger 910.

Returning to block 1108, if an overloaded and/or underperformance condition is present, the process advances to block 1116.

In block 1116, the DESS monitor/logger 910 alerts the DESS provisioner 906 to the presence of the overloaded and/or underperformance condition.

In block 1118, the DESS provisioner 906 determines which additional resources are needed (e.g., by inspecting a log file generated by the DESS monitor/logger 910), and provisions the necessary resources (if available). If the necessary resources are not available, an administrator may be alerted (e.g., via email, SMS, and/or the like). As an example, if CPU resources are being overburdened, the provisioner 906 may allocate an additional CPU core to performing DESS functions. As another example, if more network bandwidth is needed, the provisioner 906 may increase the priority of DESS traffic in a network adaptor. The type(s) and/or amount of resources provisioned in block 1118 may, for example, be determined by the conditions present in block 1108 (e.g., based on the type of resource being overburdened, the amount by which the load on the resource is above a threshold, and/or the like). The type(s) and/or amount of resources provisioned in block 1118 may, for example, be determined using machine learning algorithms based on the logs generated by the DESS monitor/logger 910.
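Blocks 1114 and 1118 pick the type and amount of resources to change. The following is a minimal rule-based sketch, assuming the amount is simply the number of load-value steps by which a resource is over or under its threshold, with actions mirroring the examples in the text (CPU cores, DESS traffic priority in a network adaptor). The action strings, the `available` map, and the sizing rule are hypothetical.

```python
from typing import Dict, List, Tuple

# (action when overloaded, action when underloaded) per resource type
ACTIONS = {
    "cpu": ("allocate CPU core(s) to the DESS", "release CPU core(s)"),
    "memory": ("allocate memory to the DESS", "release memory"),
    "network": ("raise DESS traffic priority", "lower DESS traffic priority"),
    "ssd": ("add nonvolatile storage", "remove nonvolatile storage"),
}


def plan_actions(loads: Dict[str, int], high: int, low: int,
                 available: Dict[str, int]) -> Tuple[List[Tuple[str, int]], List[str]]:
    actions, admin_alerts = [], []
    for resource, value in sorted(loads.items()):
        if value > high:
            units = value - high  # sized by how far the load exceeds the threshold
            if available.get(resource, 0) >= units:
                actions.append((ACTIONS[resource][0], units))
            else:
                admin_alerts.append(f"insufficient {resource} available to provision")
        elif value < low:
            actions.append((ACTIONS[resource][1], low - value))
    return actions, admin_alerts


actions, alerts = plan_actions({"cpu": 7, "memory": 4, "network": 1, "ssd": 7},
                               high=5, low=2, available={"cpu": 4, "ssd": 0})
print(actions)  # [('allocate CPU core(s) to the DESS', 2), ('lower DESS traffic priority', 1)]
print(alerts)   # ['insufficient ssd available to provision']
```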

FIG. 12 is a block diagram illustrating configuration of a DESS from a non-transitory machine-readable storage medium. Shown in FIG. 12 is non-transitory storage 1202 on which resides code 1203. The code is made available to computing devices 1204 and 1206 (which may be compute nodes, DESS nodes, and/or dedicated storage nodes such as those discussed above) as indicated by arrows 1210 and 1212. For example, storage 1202 may comprise one or more electronically addressed and/or mechanically addressed storage devices residing on one or more servers accessible via the Internet, and the code 1203 may be downloaded to the devices 1204 and 1206. As another example, storage 1202 may be an optical disk or FLASH-based disk which can be connected to the computing devices 1204 and 1206 (e.g., via USB, SATA, PCIe, and/or the like).

When executed by a computing device such as 1204 or 1206, the code 1203 may install and/or initialize one or more of the DESS driver, DESS front-end, DESS back-end, and/or DESS memory controller on the computing device. This may comprise copying some or all of the code 1203 into local storage and/or memory of the computing device(s) 1204 and/or 1206 and beginning to execute the code 1203 (launching one or more DESS processes) by one or more processors of the computing device(s) 1204 and/or 1206. Which of the code corresponding to the DESS driver, the DESS front-end, the DESS back-end, and/or the DESS memory controller is copied to local storage and/or memory of the computing device(s) 1204 and/or 1206 and is executed by the computing device(s) 1204 and/or 1206 may be configured by a user during execution of the code 1203 and/or by selecting which portion(s) of the code 1203 to copy and/or launch. In the example shown, execution of the code 1203 by the device 1204 has resulted in one or more client processes and one or more DESS processes being launched on the processor chipset 1214. That is, resources (processor cycles, memory, etc.) of the processor chipset 1214 are shared among the client processes and the DESS processes. On the other hand, execution of the code 1203 by the device 1206 has resulted in one or more DESS processes launching on the processor chipset 1216 and one or more client processes launching on the processor chipset 1218. In this manner, the client processes do not have to share processor chipset resources with the DESS process(es). The processor chipset 1218 may comprise, for example, a processor of a network adaptor of the device 1206.

In accordance with an example implementation of this disclosure, a distributed electronic storage system (DESS) comprises a plurality of computing devices (e.g., 120₁-120_J) communicatively coupled via one or more network links and having a file system distributed among them. The DESS comprises management circuitry (e.g., circuitry configured to implement one or more of: choking process(es) 506, DESS provisioner 906, DESS administrator 908, and DESS monitor/logger 910) that resides on a first computing device of the plurality of computing devices. The management circuitry is operable to generate an indication of a load on a first resource that resides on the first computing device. The management circuitry is operable to receive, via the one or more network links, an indication of a load on a second resource that resides on a second computing device of the plurality of computing devices. The management circuitry is operable to determine a condition of the DESS based on the indication of the load on the first resource and the indication of the load on the second resource. The management circuitry may be operable to, subsequent to generation of the indication of the load on the first resource, append the indication of the load on the first resource to an outgoing file system message (e.g., 852), and transmit the file system message onto the one or more network links. The management circuitry may be operable to, in response to the condition of the DESS being an overloaded condition, perform automatic provisioning of additional resources of the first computing device for use by the DESS. The automatic provisioning of additional resources may comprise one or more of: automatic provisioning of an additional processing core for use by the DESS; automatic provisioning of additional memory for use by the DESS; automatic provisioning of additional network bandwidth for use by the DESS; and automatic provisioning of additional nonvolatile storage for use by the DESS.
The management circuitry may be operable to, in response to the condition of the DESS being an underloaded condition, perform automatic deprovisioning of resources of the first computing device which were previously provisioned for use by the DESS. The automatic deprovisioning of resources may comprise one or more of: automatic deprovisioning of a processing core previously provisioned for use by the DESS; automatic deprovisioning of memory previously provisioned for use by the DESS; automatic deprovisioning of network bandwidth previously provisioned for use by the DESS; and automatic deprovisioning of nonvolatile storage previously provisioned for use by the DESS. The indication of the load on the first resource and the indication of the load on the second resource may each comprise one of: an indication of a load on a network link; an indication of a load on a processing core; an indication of a load on memory; and an indication of a load on a storage device. Each of the indication of the load on the first resource and the indication of the load on the second resource may comprise one or both of: an indication of a number of write operations pending for the file system; and an indication of a number of read operations pending for the file system. The first resource may be a storage device (e.g., 306), and the indication of the load on the first resource may be based on a depth of a buffer of the storage device. One or more file system request buffers (e.g., 902) may reside on the first computing device. The management circuitry may be operable to control a rate at which file system requests stored in the one or more file system request buffers are serviced. The control of the rate may be based on the determined condition of the DESS. The control of the rate may comprise control of one or both of: an interval at which batches of file system requests are fetched from the one or more buffers; and a size of each of the batches of file system requests.
The control of the rate may comprise separate control of: a rate at which file system data requests stored in the one or more buffers are serviced; and a rate at which file system metadata requests stored in the one or more buffers are serviced. The control of the rate may comprise separate control of: a rate at which file system data read requests stored in the one or more buffers are serviced; and a rate at which file system data write requests stored in the one or more buffers are serviced. The determination of the condition of the DESS may comprise calculation of a choking level which determines a rate at which requests of the file system are serviced. The management circuitry may be operable to generate an indication of a performance of the DESS, and determine a condition of the DESS based on the indication of the performance of the DESS. The management circuitry may be operable to, in response to the condition of the DESS being an underperformance condition, perform automatic provisioning of additional resources of the first computing device for use by the DESS. The management circuitry may be operable to, in response to the condition of the DESS being an overperformance condition, perform automatic deprovisioning of resources of the first computing device which were previously provisioned for use by the DESS.
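As an illustration of a choking level determining the service rate, the sketch below takes the highest load value among the relevant resources as the choking level and derives the batch fetch interval and batch size from it. Both the use of the maximum and the scaling formulas are assumptions, not taken from the text, and the base interval and batch size are hypothetical.

```python
from typing import Iterable, Tuple

MAX_LOAD = 7


def choking_level(load_values: Iterable[int]) -> int:
    """Assumed rule: the most heavily loaded resource sets the choking level."""
    return max(load_values)


def fetch_settings(level: int, base_interval_ms: float = 1.0,
                   base_batch: int = 32) -> Tuple[float, int]:
    """Higher choking level -> longer interval between batches and smaller batches."""
    if not 0 <= level <= MAX_LOAD:
        raise ValueError("level must be within 0..7")
    headroom = MAX_LOAD - level
    batch = max(1, base_batch * headroom // MAX_LOAD)  # always allow some progress
    interval = base_interval_ms * (1 + level)
    return interval, batch


print(fetch_settings(choking_level([2, 4, 1, 3])))  # (5.0, 13)
```

Separate read, write, and metadata settings could plausibly be derived the same way, each from the loads on the resources its request type uses.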

Thus, the present methods and systems may be realized in hardware, software, or a combination of hardware and software. The present methods and/or systems may be realized in a centralized fashion in at least one computing system, or in a distributed fashion where different elements are spread across several interconnected computing systems. Any kind of computing system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computing system with a program or other code that, when being loaded and executed, controls the computing system such that it carries out the methods described herein. Another typical implementation may comprise an application specific integrated circuit or chip. Some implementations may comprise a non-transitory machine-readable storage medium (e.g., FLASH drive(s), optical disk(s), magnetic storage disk(s), and/or the like) having stored thereon one or more lines of code executable by a computing device, thereby configuring the machine to implement one or more aspects of the methods and systems described herein.

While the present method and/or system has been described with reference to certain implementations, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present method and/or system. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departing from its scope. Therefore, it is intended that the present method and/or system not be limited to the particular implementations disclosed, but that the present method and/or system will include all implementations falling within the scope of the appended claims.

As utilized herein, the terms “circuits” and “circuitry” refer to physical electronic components (i.e., hardware) and any software and/or firmware (“code”) which may configure the hardware, be executed by the hardware, and/or otherwise be associated with the hardware. As used herein, for example, a particular processor and memory may comprise first “circuitry” when executing a first one or more lines of code and may comprise second “circuitry” when executing a second one or more lines of code. As utilized herein, “and/or” means any one or more of the items in the list joined by “and/or”. As an example, “x and/or y” means any element of the three-element set {(x), (y), (x, y)}. In other words, “x and/or y” means “one or both of x and y”. As another example, “x, y, and/or z” means any element of the seven-element set {(x), (y), (z), (x, y), (x, z), (y, z), (x, y, z)}. In other words, “x, y and/or z” means “one or more of x, y and z”. As utilized herein, the term “exemplary” means serving as a non-limiting example, instance, or illustration. As utilized herein, the terms “e.g.,” and “for example” set off lists of one or more non-limiting examples, instances, or illustrations. As utilized herein, circuitry is “operable” to perform a function whenever the circuitry comprises the necessary hardware and code (if any is necessary) to perform the function, regardless of whether performance of the function is disabled or not enabled (e.g., by a user-configurable setting, factory trim, etc.).

What is claimed is: 1-22. (canceled)
 23. A method for managing a distributed electronic storage system (DESS), the method comprising: generating, via a first network adapter, an indication of a load on a first set of one or more resources; receiving, via a network link, an indication of a load on a second set of one or more resources; receiving an indication of a performance over the network link; determining a condition of the DESS based on the performance over the network link, the indication of the load on the first set of one or more resources and the indication of the load on the second set of one or more resources, wherein a first computing device is operable to determine the condition as an overperformance condition.
 24. The method of claim 23, wherein the first set of one or more resources resides on the first computing device.
 25. The method of claim 23, wherein the first set of one or more resources comprises the first network adapter.
 26. The method of claim 23, wherein the first network adapter is operable to store DESS traffic in a virtual container.
 27. The method of claim 23, wherein the second set of one or more resources resides on a second computing device.
 28. The method of claim 27, wherein the second set of one or more resources comprises a second network adapter operable to store DESS traffic.
 29. The method of claim 23, wherein the condition is adaptable over time.
 30. The method of claim 23, wherein the method comprises: reducing a network congestion by changing a priority of DESS traffic according to the condition of the DESS.
 31. The method of claim 23, wherein the method comprises: adjusting, according to the condition of the DESS, one or more of: a read batch timing setting, a read batch size setting, a write batch timing setting, and a write batch size setting.
 32. The method of claim 23, wherein the method comprises: in response to the condition of the DESS being the overperformance condition, performing automatic provisioning of additional network bandwidth for use by the DESS, wherein the automatic provisioning of additional resources comprises one or more of: automatic provisioning of an additional processing core for use by the DESS; automatic provisioning of additional memory for use by the DESS; and automatic provisioning of additional nonvolatile storage for use by the DESS.
 33. A system for managing a distributed electronic storage system (DESS), the system comprising: a first computing device operable to: generate, via a first network adapter, an indication of a load on a first set of one or more resources; receive, via a network link, an indication of a load on a second set of one or more resources; receive an indication of a performance over the network link; determine a condition of the DESS based on the performance over the network link, the indication of the load on the first set of one or more resources and the indication of the load on the second set of one or more resources, wherein the first computing device is operable to determine the condition as an overperformance condition.
 34. The system of claim 33, wherein the first set of one or more resources resides on the first computing device.
 35. The system of claim 33, wherein the first set of one or more resources comprises the first network adapter.
 36. The system of claim 33, wherein the first network adapter is operable to store DESS traffic in a virtual container.
 37. The system of claim 33, wherein the second set of one or more resources resides on a second computing device.
 38. The system of claim 37, wherein the second set of one or more resources comprises a second network adapter operable to store DESS traffic.
 39. The system of claim 33, wherein the first computing device is operable to adapt the condition over time.
 40. The system of claim 33, wherein the first computing device is operable to reduce a network congestion by changing a priority of DESS traffic according to the condition of the DESS.
 41. The system of claim 33, wherein the first computing device is operable to adjust, according to the condition of the DESS, one or more of: a read batch timing setting, a read batch size setting, a write batch timing setting, and a write batch size setting.
 42. The system of claim 33, wherein the first computing device is operable to, in response to the condition of the DESS being the overperformance condition, perform automatic provisioning of additional network bandwidth for use by the DESS, wherein the automatic provisioning of additional resources comprises one or more of: automatic provisioning of an additional processing core for use by the DESS; automatic provisioning of additional memory for use by the DESS; and automatic provisioning of additional nonvolatile storage for use by the DESS.